In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd

Data Preparation¶

This lab will load three datasets: the Seattle Police Department Beats, Seattle Neighborhoods, and current crime data in Seattle.

Here is a definition of each dataset, with a link for more details.

[Seattle Police Department Beats]: The Seattle Police Department has used different iterations of police beats as the city's needs evolved. This layer represents the beats from 2017 to the present. (https://data-seattlecitygis.opendata.arcgis.com/datasets/SeattleCityGIS::seattle-police-department-beats-web-mercator/about)

[Seattle Neighborhoods]: Neighborhood map atlas neighborhood areas are derived from the Seattle City Clerk's Office Neighborhood Map Atlas. These are the smallest areas and have been supplemented with alternate names from other sources. They roll up to the district areas. (https://data-seattlecitygis.opendata.arcgis.com/datasets/SeattleCityGIS::neighborhood-map-atlas-neighborhoods/about)

[SPD Crime Data 2008 - Present]: The Seattle Police Department stores crime data from 2008 to the present in the National Incident-Based Reporting System (NIBRS). The data includes report date and time, block address, latitude, longitude, offense sub-category, and more. (https://data.seattle.gov/Public-Safety/SPD-Crime-Data-2008-Present/tazs-3rd5/about_data)

The crime data grows with new records over time, while the beats and neighborhoods data are essentially constant. The full 2008-to-present crime data is therefore huge, and the API endpoint returns only a limited number of records; in my experience, this crime data endpoint returns about 2,000 records. Because of this cap, we will query the crime data by date so that only recent records are returned. Otherwise, the API would return an arbitrary subset of the dataset, and it would not be meaningful to analyze it.
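The date filter is plain SoQL passed through the URL's $where parameter, so the query string can be built programmatically instead of hand-encoded. A minimal sketch (the build_crime_url helper is my own convenience, not part of the Socrata API):

```python
from urllib.parse import quote

# Sketch: build the Socrata (SoQL) query URL for a date range.
# The endpoint and column name come from the lab's dataset; the
# build_crime_url helper itself is an assumed convenience function.
BASE = "https://data.seattle.gov/resource/tazs-3rd5.json"

def build_crime_url(start_date, end_date, limit=2000):
    # quote() percent-encodes the spaces and quotes in the SoQL clause
    where = f"report_date_time between '{start_date}' and '{end_date}'"
    return f"{BASE}?$limit={limit}&$where={quote(where)}"

url = build_crime_url("2025-11-06", "2025-11-11")
```

Changing the two date arguments is then enough to pull a different window of records.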

We will use API endpoints that return GeoJSON to load the beats and neighborhoods.

In [2]:
beats=gpd.read_file("https://services.arcgis.com/ZOyb2t4B0UYuYNYH/arcgis/rest/services/Current_Beats/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")
In [3]:
nbh=gpd.read_file("https://services.arcgis.com/ZOyb2t4B0UYuYNYH/arcgis/rest/services/nma_nhoods_sub/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")

For the crime data, we will build the query first and generate the API endpoint URL.

Crime Data

In [4]:
url = "https://data.seattle.gov/resource/tazs-3rd5.json?$limit=2000&$where=report_date_time%20between%20%272025-11-06%27%20and%20%272025-11-11%27"

incidents = pd.read_json(url)

incidents.head()
Out[4]:
report_number report_date_time offense_id offense_date nibrs_group_a_b nibrs_crime_against_category offense_sub_category shooting_type_group block_address latitude longitude beat precinct sector neighborhood reporting_area offense_category nibrs_offense_code_description nibrs_offense_code
0 2025-325462 2025-11-06 00:11:39 67248678577 2025-11-05T22:47:00.000 A PERSON ASSAULT OFFENSES - 11XX BLOCK OF SUMMIT AVE 47.61133751 -122.32389709863 E3 East E FIRST HILL 7623 ALL OTHER Intimidation 13C
1 2025-325470 2025-11-06 00:15:06 67217550137 2025-11-05T23:14:00.000 B SOCIETY DISORDERLY CONDUCT & VAGRANCY VIOLATIONS - 15XX BLOCK OF 3RD AVE W 47.63292666 -122.361015838569 Q2 West Q QUEEN ANNE 8623 ALL OTHER Curfew/Loitering/Vagrancy Violations 90B
2 2025-325470 2025-11-06 00:15:06 67217554931 2025-11-05T23:14:00.000 B ANY ALL OTHER - 15XX BLOCK OF 3RD AVE W 47.63292666 -122.361015838569 Q2 West Q QUEEN ANNE 8623 ALL OTHER All Other Offenses 90Z
3 2025-325470 2025-11-06 00:15:06 67217557249 2025-11-05T23:14:00.000 A SOCIETY NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) - 15XX BLOCK OF 3RD AVE W 47.63292666 -122.361015838569 Q2 West Q QUEEN ANNE 8623 ALL OTHER Drug/Narcotic Violations 35A
4 2025-325482 2025-11-06 00:55:17 67217736640 2025-11-05T23:58:00.000 A SOCIETY NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) - 6TH AVE S / S DEARBORN ST 47.59583728 -122.326373794937 K3 West K CHINATOWN/INTERNATIONAL DISTRICT 1499 ALL OTHER Drug/Narcotic Violations 35A
In [5]:
incidents['longitude'].describe()
Out[5]:
count          952
unique         628
top       REDACTED
freq           163
Name: longitude, dtype: object
In [6]:
incidents['latitude'].describe()
Out[6]:
count          952
unique         616
top       REDACTED
freq           163
Name: latitude, dtype: object
In [7]:
incidents2 = incidents[(incidents['longitude'] != 'REDACTED')]
In [8]:
incidents2 = incidents2.astype({'longitude': float, 'latitude':float})
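As an alternative to filtering out the 'REDACTED' sentinel and then casting, pd.to_numeric with errors='coerce' turns any non-numeric value into NaN so the bad rows can be dropped in one pass. A sketch with toy data standing in for the incidents frame:

```python
import pandas as pd

# Toy stand-in for the incidents frame; in the lab the same two lines
# would be applied to `incidents` directly.
df = pd.DataFrame({
    "latitude": ["47.61", "REDACTED"],
    "longitude": ["-122.33", "REDACTED"],
})

df["latitude"] = pd.to_numeric(df["latitude"], errors="coerce")    # 'REDACTED' -> NaN
df["longitude"] = pd.to_numeric(df["longitude"], errors="coerce")
df = df.dropna(subset=["latitude", "longitude"])                   # drop redacted rows
```

This pattern also catches any other malformed coordinate strings, not only the known sentinel value.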
In [9]:
# gpd.GeoDataFrame() converts the data to spatial form (a GeoDataFrame),
# building point geometries with the points_from_xy() method.
spd_crime=gpd.GeoDataFrame(incidents2, geometry=gpd.points_from_xy(incidents2.longitude, incidents2.latitude), crs='EPSG:4326')
In [10]:
spd_crime.plot()
Out[10]:
<Axes: >
[plot of the spd_crime incident points]
In [11]:
spd_crime['longitude'].describe()
Out[11]:
count    789.000000
mean    -122.022878
std        6.104854
min     -122.408242
25%     -122.347592
50%     -122.327014
75%     -122.313407
max       -1.000000
Name: longitude, dtype: float64
In [12]:
spd_crime2 = spd_crime[(spd_crime['longitude'] != -1)]
In [13]:
spd_crime2.plot()
Out[13]:
<Axes: >
[plot of the filtered spd_crime2 incident points]
In [14]:
spd_crime3 = spd_crime2[['report_number', 'report_date_time', 'offense_id', 'offense_category', 'latitude', 'longitude', 'geometry']].copy()
spd_crime3
Out[14]:
report_number report_date_time offense_id offense_category latitude longitude geometry
0 2025-325462 2025-11-06 00:11:39 67248678577 ALL OTHER 47.611338 -122.323897 POINT (-122.3239 47.61134)
1 2025-325470 2025-11-06 00:15:06 67217550137 ALL OTHER 47.632927 -122.361016 POINT (-122.36102 47.63293)
2 2025-325470 2025-11-06 00:15:06 67217554931 ALL OTHER 47.632927 -122.361016 POINT (-122.36102 47.63293)
3 2025-325470 2025-11-06 00:15:06 67217557249 ALL OTHER 47.632927 -122.361016 POINT (-122.36102 47.63293)
4 2025-325482 2025-11-06 00:55:17 67217736640 ALL OTHER 47.595837 -122.326374 POINT (-122.32637 47.59584)
... ... ... ... ... ... ... ...
945 2025-330622 2025-11-10 22:58:50 67307792849 ALL OTHER 47.690621 -122.374103 POINT (-122.3741 47.69062)
946 2025-330622 2025-11-10 22:58:50 67306450951 ALL OTHER 47.690621 -122.374103 POINT (-122.3741 47.69062)
948 2025-330645 2025-11-10 23:23:05 67306954301 ALL OTHER 47.613375 -122.346513 POINT (-122.34651 47.61338)
950 2025-330661 2025-11-10 23:40:38 67307191880 PROPERTY CRIME 47.647728 -122.400922 POINT (-122.40092 47.64773)
951 2025-330610 2025-11-10 23:41:46 67307019385 PROPERTY CRIME 47.609604 -122.329221 POINT (-122.32922 47.6096)

787 rows × 7 columns

In [15]:
import folium
In [16]:
map = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
map
Out[16]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [17]:
map = folium.Map(location=[47.6577792,-122.3031197], tiles="CartoDB Positron", zoom_start=10)
map
Out[17]:
Make this Notebook Trusted to load map: File -> Trust Notebook

We can pass a GeoJSON object to the folium.GeoJson() method, and a convenient feature of geopandas is that a GeoDataFrame can be passed directly to folium.GeoJson() to make a map. The following code creates a Seattle neighborhood map with a popup displaying the value in the 'L_HOOD' column. Note that the code first makes a map object, then makes a layer from the nbh data and attaches it with the add_to() method. Lastly, when map1 is called, map1 is displayed as the output.

In [18]:
map1 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
folium.GeoJson(nbh,
            popup=folium.GeoJsonPopup(fields=['L_HOOD'])   
              ).add_to(map1)
map1
Out[18]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [19]:
map1.save("index.html")

map2 displays the SPD beats data with a popup showing the beat name.

In [20]:
map2 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
folium.GeoJson(beats,
              popup=folium.GeoJsonPopup(fields=['beat'])  
              ).add_to(map2)
map2
Out[20]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [21]:
spd_crime3['report_date_time'] = spd_crime3['report_date_time'].dt.strftime('%Y-%m-%d %H:%M:%S')
map3 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)
folium.GeoJson(spd_crime3).add_to(map3)
map3
Out[21]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Incidents by category¶

Now we will classify the data by offense category to map the crime reports by category. We will first find the list of offense types and then assign a color to each type in the list.

In [22]:
offense_type = spd_crime3.offense_category.unique().tolist()
offense_type
Out[22]:
['ALL OTHER', 'PROPERTY CRIME', 'VIOLENT CRIME']
In [23]:
colors = {'PROPERTY CRIME':'red', 'VIOLENT CRIME': 'darkpurple', 'ALL OTHER':'lightblue'}

Now we have three types of reported offense and have assigned color codes in the colors dictionary. The following code iterates over the records in the GeoDataFrame, checks the offense category, and assigns the matching color from colors to each marker; records whose category is not a key in colors fall back to gray.

In [24]:
map4 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)

# Add markers with colors based on category
for index, row in spd_crime3.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['offense_category'],
        icon=folium.Icon(color=colors.get(row['offense_category'], 'gray')) # Default to gray if category not found
    ).add_to(map4)

map4
Out[24]:
Make this Notebook Trusted to load map: File -> Trust Notebook

First, let's look at incidents from today. The following code makes an object holding today's date. Working with datetime data is a bit tricky because we need to distinguish standard time from local time (time zones). pd.to_datetime('today') returns today's date, but not in local time, so we convert it. Passing utc=True makes the timestamp timezone-aware; tz_convert('America/Los_Angeles') then converts it to Seattle's local time, and date() extracts the date portion.
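The timezone steps can be seen in isolation with a short sketch (nothing here depends on the crime data):

```python
import pandas as pd

# Minimal sketch of the timezone handling described above.
naive = pd.to_datetime("today")                   # timezone-naive timestamp
aware = pd.to_datetime("today", utc=True)         # timezone-aware (UTC)
local = aware.tz_convert("America/Los_Angeles")   # Seattle local time
local_date = local.date()                         # just the date portion
```

The naive timestamp carries no zone information, so it cannot be converted; only the utc=True version can be shifted to a local zone.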

In [25]:
curr_date = pd.to_datetime('today', utc=True).tz_convert('America/Los_Angeles').date()
In [26]:
curr_date
Out[26]:
datetime.date(2025, 12, 1)

See the incidents reported today with the next code line. In my case, it does not return any records, either because no incidents have occurred or because none have been reported yet.

In [27]:
incidents_today =  spd_crime3[pd.to_datetime(spd_crime3['report_date_time']).dt.date == curr_date]
In [28]:
incidents_today
Out[28]:
report_number report_date_time offense_id offense_category latitude longitude geometry
In [29]:
map5 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=10)

folium.GeoJson(incidents_today).add_to(map5)

map5
Out[29]:
Make this Notebook Trusted to load map: File -> Trust Notebook

How about incidents reported yesterday? To compute the date, we use a Timedelta. Timedeltas are absolute differences in times, expressed in different units (e.g., days, hours, minutes, seconds). days=1 in pd.Timedelta indicates a one-day duration, so subtracting pd.Timedelta(days=1) from curr_date yields yesterday's date.
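A minimal sketch of the Timedelta arithmetic (the dates here are illustrative, not taken from the data):

```python
import pandas as pd
from datetime import date

# Subtracting a one-day Timedelta from a date yields the previous day.
today = date(2025, 11, 11)
yesterday = today - pd.Timedelta(days=1)

# The same duration can be expressed in different units.
one_day = pd.Timedelta(days=1)
```

Because pd.Timedelta is a subclass of datetime.timedelta, it works directly in arithmetic with plain datetime.date values.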

In [30]:
incidents_yesterday =  spd_crime3[pd.to_datetime(spd_crime3['report_date_time']).dt.date == curr_date - pd.Timedelta(days=1)]
In [31]:
incidents_yesterday
Out[31]:
report_number report_date_time offense_id offense_category latitude longitude geometry
In [32]:
from datetime import date

Incidents by Neighborhood by date¶

Here, we will look at the distribution of incident totals by date and examine the spatial distribution for the date with the highest number of incidents. The process is:

  • Group the incidents by date and count the number of incident events in each group to get the distribution by date.
  • From the distribution, find the date with the highest number of incident events.
In [33]:
spd_crime3.loc[:, 'report_date_time'] = pd.to_datetime(spd_crime3['report_date_time']).dt.date
daily_crime_counts = spd_crime3.groupby(['report_date_time'])['offense_id'].count().reset_index()
In [34]:
daily_crime_counts
Out[34]:
report_date_time offense_id
0 2025-11-06 192
1 2025-11-07 114
2 2025-11-08 144
3 2025-11-09 175
4 2025-11-10 162
In [35]:
daily_crime_counts = daily_crime_counts.rename(columns={"offense_id":"n_incidents"})
daily_crime_counts.plot.bar(x='report_date_time', y='n_incidents')
Out[35]:
<Axes: xlabel='report_date_time'>
[bar chart of daily incident counts by report_date_time]

From the output dataframe and the bar graph, Nov. 6, 2025 has the highest number of incidents. Now let's make a choropleth map to see the spatial distribution of the events on that date.

In [36]:
crime_mmddyy=spd_crime3[pd.to_datetime(spd_crime3['report_date_time']).dt.date == date(2025, 11, 6)]
crime_nbh_mmdd = nbh.sjoin(crime_mmddyy, how='left', predicate='intersects')
crime_nbh_mmdd = crime_nbh_mmdd.dissolve(by='S_HOOD', aggfunc='count').reset_index()
In [37]:
crime_nbh_mmdd = crime_nbh_mmdd[['S_HOOD','geometry','OBJECTID']].rename(columns={"OBJECTID":"n_incidents"})
In [38]:
crime_nbh_mmdd
Out[38]:
S_HOOD geometry n_incidents
0 Alki POLYGON ((-122.38153 47.5895, -122.38182 47.58... 1
1 Arbor Heights POLYGON ((-122.3769 47.51749, -122.376 47.5174... 1
2 Atlantic POLYGON ((-122.30837 47.60167, -122.30401 47.6... 7
3 Ballard POLYGON ((-122.37621 47.6671, -122.37622 47.66... 3
4 Belltown POLYGON ((-122.34264 47.61637, -122.33874 47.6... 6
... ... ... ...
89 West Woodland POLYGON ((-122.37559 47.67593, -122.37329 47.6... 6
90 Westlake POLYGON ((-122.34354 47.62802, -122.34354 47.6... 1
91 Whittier Heights POLYGON ((-122.37634 47.67592, -122.37652 47.6... 1
92 Windermere POLYGON ((-122.27954 47.67578, -122.27736 47.6... 1
93 Yesler Terrace POLYGON ((-122.31678 47.60612, -122.31678 47.6... 2

94 rows × 3 columns
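One caveat with the left-join-then-count pattern above: since OBJECTID comes from the neighborhoods side, a neighborhood with no matching incidents still contributes its single unmatched row, so its count appears as 1 rather than 0. A pandas-only sketch with made-up data illustrates the difference between counting a left-side and a right-side column:

```python
import pandas as pd

# Toy stand-ins: two neighborhoods, incidents only in the first.
nbh_toy = pd.DataFrame({"S_HOOD": ["Alki", "Atlantic"], "OBJECTID": [1, 2]})
crimes_toy = pd.DataFrame({"S_HOOD": ["Alki", "Alki"], "offense_id": [10, 11]})

joined = nbh_toy.merge(crimes_toy, on="S_HOOD", how="left")

# Counting the left-side column: the unmatched neighborhood still counts 1.
left_count = joined.groupby("S_HOOD")["OBJECTID"].count()

# Counting a right-side column: unmatched rows are NaN, so the count is a true 0.
right_count = joined.groupby("S_HOOD")["offense_id"].count()
```

Counting a column from the incident side (such as offense_id) would give zero-incident neighborhoods an honest zero in the choropleth.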

In [39]:
from branca.colormap import linear

colormap = linear.PuRd_05.scale(
    crime_nbh_mmdd.n_incidents.min(),  crime_nbh_mmdd.n_incidents.max()
)

data_dict =  crime_nbh_mmdd.set_index("S_HOOD")["n_incidents"]

map8 = folium.Map(location=[47.6577792,-122.3031197], zoom_start=11)
folium.GeoJson(crime_nbh_mmdd,
    name="incidents",
    style_function=lambda feature: {
        "fillColor": colormap(data_dict[feature['properties']["S_HOOD"]]),
        "color": "black",
        "weight": 1,
        "dashArray": "5, 5",
        "fillOpacity": 0.9,
    },
              popup=folium.GeoJsonPopup(fields=['S_HOOD','n_incidents'])
              ).add_to(map8)

folium.LayerControl().add_to(map8)
colormap.add_to(map8)
map8
Out[39]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Buffers¶

A buffer is the most commonly used proximity analysis. It creates an area around a given feature (point, line, or polygon) at a given distance. We have to be careful when assigning a distance: the buffer distance is Euclidean, yet the point data is in the WGS 84 (EPSG:4326) coordinate system. Remember that the point data was generated from longitude and latitude; the base unit of EPSG:4326 is the degree, which does not match a Euclidean distance in meters. So we first reproject the point layer to a projection with a metric unit and then assign the buffer distance. This example reprojects the incident layer to EPSG:32610, which uses the WGS 84 datum and the UTM zone 10N projection.
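As a rough sanity check on the unit mismatch (the constants below are standard spherical-Earth approximations I am assuming, not values from the lab): one degree of latitude is about 111 km, and one degree of longitude shrinks with the cosine of the latitude, so a 500 m buffer at Seattle's latitude corresponds to only a small fraction of a degree:

```python
import math

# Approximate meters per degree of latitude (spherical-Earth approximation).
M_PER_DEG_LAT = 111_320.0

# At Seattle's latitude (~47.6 N), a degree of longitude is shorter.
lat = 47.6
m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(lat))

# A 500 m buffer expressed in degrees of longitude at this latitude:
buffer_deg = 500 / m_per_deg_lon
```

Calling buffer(500) directly on EPSG:4326 geometry would instead mean 500 degrees, which is why the reprojection to a metric CRS comes first.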

First I will make point data for the Pioneer Square location from its latitude and longitude. I got these coordinates from OpenStreetMap, which is the default basemap in Folium. Then I reproject the point to EPSG:32610 and make a 500-meter buffer around it. Next, the buffer is reprojected back to EPSG:4326 with to_crs() to continue geoprocessing with the other datasets in this practice.

In [40]:
from shapely.geometry import Point 
In [41]:
ps_loc = Point(-122.3313539, 47.6027217) 
ps = gpd.GeoDataFrame(geometry=[ps_loc], crs=4326)
ps_utm = ps.to_crs('epsg:32610')
ps_buffer = ps_utm.buffer(500)
ps_buffer_4326 = ps_buffer.to_crs('epsg:4326')
In [42]:
ps_buffer_4326
Out[42]:
0    POLYGON ((-122.3247 47.60268, -122.32474 47.60...
dtype: geometry
In [43]:
map9 = folium.Map(location=[47.6008863,-122.3377], zoom_start=14)
folium.GeoJson(
        data=ps_buffer_4326.to_json(),
   ).add_to(map9)
folium.Marker(
        location=[47.6027217,-122.3313539],
  ).add_to(map9)
map9
Out[43]:
Make this Notebook Trusted to load map: File -> Trust Notebook

The buffer polygon was created successfully! We are ready to find the incidents within 500 meters of Pioneer Square. The computation happens with the geopandas overlay intersection method. I will put the buffer geometry into a GeoDataFrame, then pass it to the overlay function as the geometry that filters the incidents.

In [44]:
ps_buffer_4326 = gpd.GeoDataFrame(geometry=ps_buffer_4326)
In [45]:
crime_ps =  spd_crime3.overlay(ps_buffer_4326, how='intersection')

Interactive Widgets¶

Widgets are eventful python objects that have a representation in the browser, often as a control like a slider, textbox, etc. You can use widgets to build interactive GUIs for your notebooks or to synchronize stateful and stateless information between Python and JavaScript.

Jupyter Widgets is primarily a framework to provide interactive controls. The ipywidgets package also provides a basic, lightweight set of core form controls that use this framework. These core controls include a text area, text box, select and multiselect controls, checkbox, sliders, tab panels, grid layout, etc.

The framework for building rich interactive objects is the foremost purpose of the Jupyter Widgets project, and the set of included core form controls is purposefully kept small and self-contained. We encourage and support a robust ecosystem of packages built on top of the Jupyter Widgets framework to provide more complicated interactive objects, such as maps, 2d and 3d visualizations, or other form control systems built on a variety of popular Javascript frameworks such as Material or Vue.

We have filtered the incident data by location, by polygon, by date, and by proximity to an area. Here, we will make an interactive map that allows a user to examine the incident data by neighborhood.

In [46]:
import ipywidgets as widgets
from IPython.display import display

spd_crime3['report_date_time'] = spd_crime3['report_date_time'].astype('str')
map11 = folium.Map(location=[47.6008863,-122.3377], zoom_start=10)
out = widgets.Output(layout={'border': '1px solid black'})

# Create a Dropdown widget
dropdown = widgets.Dropdown(
    options=nbh['S_HOOD'].values.tolist(),
    value=nbh['S_HOOD'].values[0],  # Default selected value
    description='Select',
    disabled=False
)

def on_dropdown_change(change):
    out.clear_output()
    with out:
        map11_n = folium.Map(location=[47.6008863,-122.3377], zoom_start=10)
        my_gdf = spd_crime3.overlay(nbh[nbh['S_HOOD']==dropdown.value], how='intersection')
        layer1 = folium.GeoJson(my_gdf).add_to(map11_n)
        n_mygdf = len(my_gdf)
        print(f"The number of crimes at {dropdown.value} is {n_mygdf}")
        display(map11_n)
        

dropdown.observe(on_dropdown_change, names='value')
display(dropdown)

with out:
    display(map11)

out
Dropdown(description='Select', options=('Loyal Heights', 'Ballard', 'Whittier Heights', 'West Woodland', 'Phin…
Out[46]:
Output(layout=Layout(border_bottom='1px solid black', border_left='1px solid black', border_right='1px solid b…

Autoplay Heatmap with Timestamp¶

In [47]:
# Convert to date; groupby() below sorts the dates, so no explicit sort is needed.
spd_crime3['report_date_time'] = pd.to_datetime(spd_crime3['report_date_time']).dt.date
data = []
for _, d in spd_crime3.groupby('report_date_time'):
   data.append([[row['latitude'], row['longitude']] for _, row in d.iterrows()])
In [48]:
from datetime import datetime, timedelta

# Label each time slice with its actual date (robust to gaps in the data).
time_index = [
    d.strftime("%Y-%m-%d") for d in sorted(spd_crime3['report_date_time'].unique())
]
In [49]:
from folium.plugins import HeatMapWithTime

map12 = folium.Map(location=[47.6008863,-122.3377], zoom_start=10)

hm = HeatMapWithTime(data, index=time_index, auto_play=True, max_opacity=0.3)

hm.add_to(map12)

display(map12)
Make this Notebook Trusted to load map: File -> Trust Notebook

Now we will query the incident data for two different 5-day periods this year and compare the number of crimes during each period.

  1. Which period had the highest number of total crimes?
  2. For each time period: which beat shows the highest number of violent crimes?
  3. For each time period: how many property crimes occurred on the second-to-last day of the dataset I queried?
In [50]:
url1 = (
    "https://data.seattle.gov/resource/tazs-3rd5.json"
    "?$limit=2000"
    "&$where=report_date_time%20between%20'2025-11-06'%20and%20'2025-11-11'"
)

incidents1 = pd.read_json(url1)
In [51]:
url2 = (
    "https://data.seattle.gov/resource/tazs-3rd5.json"
    "?$limit=2000"
    "&$where=report_date_time%20between%20'2025-10-01'%20and%20'2025-10-05'"
)

incidents2 = pd.read_json(url2)

Total number of crimes in each period

In [52]:
total_p1 = len(incidents1)
total_p2 = len(incidents2)

total_p1, total_p2
Out[52]:
(952, 704)

Beat with highest number of violent crimes in each period

In [53]:
def beat_with_most_violent(df):
    violent = df[df["nibrs_crime_against_category"] == "PERSON"]
    by_beat = violent.groupby("beat")["offense_id"].count().sort_values(ascending=False)
    return by_beat.head(1)   # returns the top beat and count

violent_p1 = beat_with_most_violent(incidents1)
violent_p2 = beat_with_most_violent(incidents2)

violent_p1, violent_p2
Out[53]:
(beat
 K3    11
 Name: offense_id, dtype: int64,
 beat
 D1    9
 Name: offense_id, dtype: int64)

Property crimes on the second-last day of each period

In [54]:
def property_on_second_last_day(df):
    # make a date column
    df = df.copy()
    df["report_date"] = pd.to_datetime(df["report_date_time"]).dt.date

    # keep only property crimes
    prop = df[df["nibrs_crime_against_category"] == "PROPERTY"]

    # sorted unique dates within this subset
    unique_dates = sorted(prop["report_date"].unique())
    if len(unique_dates) < 2:
        return None  # not enough days with property crimes

    second_last_day = unique_dates[-2]

    # count property crimes on that day
    count = (prop["report_date"] == second_last_day).sum()
    return second_last_day, count

prop_p1 = property_on_second_last_day(incidents1)
prop_p2 = property_on_second_last_day(incidents2)

prop_p1, prop_p2
Out[54]:
((datetime.date(2025, 11, 9), 136), (datetime.date(2025, 10, 3), 98))

Now I will analyze one of the datasets from the first part: uncovering which incident type was most common over the time period I queried, and in which neighborhood that incident type was most prevalent.

Most common incident type during the 5-day period¶

In [55]:
common_type = incidents['offense_category'].value_counts().idxmax()
count_common = incidents['offense_category'].value_counts().max()

common_type, count_common
Out[55]:
('ALL OTHER', 456)

Neighborhood where that incident was most prevalent¶

In [56]:
subset = incidents[incidents['offense_category'] == common_type]

neigh_counts = subset['neighborhood'].value_counts()
top_neigh = neigh_counts.idxmax()
top_neigh_count = neigh_counts.max()

top_neigh, top_neigh_count
Out[56]:
('CAPITOL HILL', 55)

Next, I will create a 750-meter buffer around the University of Washington, using the University of Washington neighborhood polygon from the neighborhood dataset, and then analyze the crime incidents within the buffer zone:

  1. Create a bar plot to show the number of incidents per day in the time period I queried.
  2. Create a folium map which displays the buffer boundary and all crime incidents within the buffer, with different colored markers for each crime category.

First, load the neighborhood polygons, extract the UW polygon, and create a 750-meter buffer.

In [57]:
# Load Seattle neighborhood polygons
neigh = gpd.read_file("Neighborhood.geojson")

# Filter all UW-related polygons using S_HOOD
uw_poly = neigh[neigh["S_HOOD"].str.contains("University", case=False, na=False)]

# Reproject to Web Mercator (meters)
uw_poly_3857 = uw_poly.to_crs(epsg=3857)

# Merge polygons + create 750 m buffer
uw_buffer_geom = uw_poly_3857.union_all().buffer(750)

# Convert buffer back to EPSG:4326 for Folium
uw_buffer_gdf = gpd.GeoDataFrame(geometry=[uw_buffer_geom], crs="EPSG:3857").to_crs(epsg=4326)

uw_buffer_gdf
Out[57]:
geometry
0 POLYGON ((-122.3086 47.64313, -122.3087 47.643...

Clean incident coordinates

In [58]:
# Clean REDACTED or invalid coordinates
incidents_clean = incidents.copy()
incidents_clean["latitude"]  = pd.to_numeric(incidents_clean["latitude"],  errors="coerce")
incidents_clean["longitude"] = pd.to_numeric(incidents_clean["longitude"], errors="coerce")
incidents_clean = incidents_clean.dropna(subset=["latitude", "longitude"])

# Convert to GeoDataFrame
inc_gdf = gpd.GeoDataFrame(
    incidents_clean,
    geometry=gpd.points_from_xy(incidents_clean["longitude"], incidents_clean["latitude"]),
    crs="EPSG:4326"
).to_crs(epsg=3857)

# Pick incidents INSIDE the UW buffer
inc_in_buffer = inc_gdf[inc_gdf.within(uw_buffer_geom)].copy()

print("Incidents inside 750m UW buffer:", len(inc_in_buffer))
Incidents inside 750m UW buffer: 60

Bar Plot of Incidents Per Day

In [59]:
import matplotlib.pyplot as plt
# Extract dates
inc_in_buffer["date_only"] = pd.to_datetime(inc_in_buffer["report_date_time"]).dt.date

# Bar plot
plt.figure(figsize=(10,5))
inc_in_buffer.groupby("date_only")["offense_id"].count().plot(kind="bar")
plt.title("Incidents per Day within 750m of UW")
plt.xlabel("Date")
plt.ylabel("Incident Count")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
[bar chart: Incidents per Day within 750m of UW]

Folium Map (Buffer + Colored Crime Markers)

In [60]:
# Base map centered on UW
m = folium.Map(location=[47.655, -122.303], zoom_start=14)

# Add buffer boundary
folium.GeoJson(
    uw_buffer_gdf.to_json(),
    style_function=lambda x: {"color": "blue", "weight": 2, "fillOpacity": 0}
).add_to(m)

# Reproject incidents for Folium (EPSG:4326)
inc_in_buffer_4326 = inc_in_buffer.to_crs(epsg=4326)

# Marker colors by category
color_map = {
    "PROPERTY CRIME": "red",
    "PERSON": "purple",
    "ALL OTHER": "green"
}

# Add crime markers
for _, row in inc_in_buffer_4326.iterrows():
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        radius=4,
        color=color_map.get(row["offense_category"], "gray"),
        fill=True,
        fill_opacity=0.7,
        popup=f"{row['offense_category']}<br>{row['report_number']}"
    ).add_to(m)

m
Out[60]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Analysis¶

Over the 5-day period, the bar plot of incidents within 750 meters of the University of Washington shows noticeable variation in daily counts, with one or two dates standing out as local peaks. This suggests that crime near campus is somewhat clustered in time, likely reflecting patterns in student presence, nightlife, and the timing of classes and activities. Days with higher incident counts may correspond to evenings with heavier foot traffic or busier periods around commercial areas.

The folium map further contextualizes these temporal patterns spatially. Within the buffer, incidents are not evenly distributed; they cluster along the campus edges and nearby streets, especially around major arterials and mixed-use blocks. Property crimes appear most frequently and are widely scattered across the buffer, which is consistent with thefts and car prowls in a dense university neighborhood. Person-related incidents are fewer but tend to appear along main routes and near transit stops, where more interpersonal interactions occur.

Overall, the analysis indicates that crime around UW is shaped by both daily activity rhythms and the built environment. These patterns can help inform targeted patrols, lighting improvements, and safety outreach in the areas where students and residents are most active.

Autoplay Heatmap with Timestamp¶

Picking a Date and Prepping the Data

In [61]:
df = incidents.copy()

# Clean coordinates (handles 'REDACTED' etc.)
df["latitude"]  = pd.to_numeric(df["latitude"],  errors="coerce")
df["longitude"] = pd.to_numeric(df["longitude"], errors="coerce")
df = df.dropna(subset=["latitude", "longitude"])

# Make pure date and hour columns
df["report_date_time"] = pd.to_datetime(df["report_date_time"])
df["date_only"] = df["report_date_time"].dt.date
df["hour"] = df["report_date_time"].dt.hour

# The date with the most incidents:
date_counts = df["date_only"].value_counts()
chosen_date = date_counts.idxmax()
print("Using date:", chosen_date)

df_day = df[df["date_only"] == chosen_date].copy()

print("Incidents on chosen date:", len(df_day))
df_day.head()
Using date: 2025-11-06
Incidents on chosen date: 192
Out[61]:
report_number report_date_time offense_id offense_date nibrs_group_a_b nibrs_crime_against_category offense_sub_category shooting_type_group block_address latitude ... beat precinct sector neighborhood reporting_area offense_category nibrs_offense_code_description nibrs_offense_code date_only hour
0 2025-325462 2025-11-06 00:11:39 67248678577 2025-11-05T22:47:00.000 A PERSON ASSAULT OFFENSES - 11XX BLOCK OF SUMMIT AVE 47.611338 ... E3 East E FIRST HILL 7623 ALL OTHER Intimidation 13C 2025-11-06 0
1 2025-325470 2025-11-06 00:15:06 67217550137 2025-11-05T23:14:00.000 B SOCIETY DISORDERLY CONDUCT & VAGRANCY VIOLATIONS - 15XX BLOCK OF 3RD AVE W 47.632927 ... Q2 West Q QUEEN ANNE 8623 ALL OTHER Curfew/Loitering/Vagrancy Violations 90B 2025-11-06 0
2 2025-325470 2025-11-06 00:15:06 67217554931 2025-11-05T23:14:00.000 B ANY ALL OTHER - 15XX BLOCK OF 3RD AVE W 47.632927 ... Q2 West Q QUEEN ANNE 8623 ALL OTHER All Other Offenses 90Z 2025-11-06 0
3 2025-325470 2025-11-06 00:15:06 67217557249 2025-11-05T23:14:00.000 A SOCIETY NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) - 15XX BLOCK OF 3RD AVE W 47.632927 ... Q2 West Q QUEEN ANNE 8623 ALL OTHER Drug/Narcotic Violations 35A 2025-11-06 0
4 2025-325482 2025-11-06 00:55:17 67217736640 2025-11-05T23:58:00.000 A SOCIETY NARCOTIC VIOLATIONS (INCLUDES DRUG EQUIP.) - 6TH AVE S / S DEARBORN ST 47.595837 ... K3 West K CHINATOWN/INTERNATIONAL DISTRICT 1499 ALL OTHER Drug/Narcotic Violations 35A 2025-11-06 0

5 rows × 21 columns

Build Hourly Time Slices for HeatMapWithTime

In [62]:
from collections import defaultdict

# Group coordinates by hour for the chosen date
hourly_points = defaultdict(list)

for _, row in df_day.iterrows():
    h = row["hour"]
    hourly_points[h].append([row["latitude"], row["longitude"]])

# Sort hours and build list in time order
hours_sorted = sorted(hourly_points.keys())
heat_data = [hourly_points[h] for h in hours_sorted]

print("Hours in this day:", hours_sorted)
Hours in this day: [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
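The defaultdict loop above works; a more pandas-idiomatic equivalent (a sketch assuming the same latitude, longitude, and hour columns) builds the same hour-ordered list of coordinate pairs with groupby:

```python
import pandas as pd

# Toy frame standing in for df_day; in the lab the comprehension below
# would run on df_day itself.
df_day_toy = pd.DataFrame({
    "hour": [0, 0, 1],
    "latitude": [47.61, 47.63, 47.60],
    "longitude": [-122.33, -122.36, -122.32],
})

# groupby sorts the hour keys, so the slices come out in time order.
heat_data = [
    g[["latitude", "longitude"]].values.tolist()
    for _, g in df_day_toy.groupby("hour")
]
```

This avoids the explicit row loop and keeps the hour ordering guarantee in one place (groupby's sorted keys).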

Autoplay Heatmap with Timestamp Map¶

In [63]:
from folium.plugins import HeatMapWithTime

# Center map roughly on Seattle
m = folium.Map(location=[47.61, -122.33], zoom_start=12)

# Labels for each time step (show up in the time slider)
time_index = [f"{chosen_date} {h:02d}:00" for h in hours_sorted]

HeatMapWithTime(
    data=heat_data,
    index=time_index,
    auto_play=True,
    max_opacity=0.8,
    radius=12,
    use_local_extrema=False
).add_to(m)

m
Out[63]:
Make this Notebook Trusted to load map: File -> Trust Notebook